SUMMARY


Research done in the 1960's and early 1970's suggested that although
statistical weights and subjective weights show some correspondence in
regression-like situations, subjective weights tend to be too flat by
comparison; statistical weights usually show that some attributes are quite
important, while others are hardly important at all. More recent discussions
of this literature, however, have pointed out a number of methodological
problems with much of the early research, and have reached a more optimistic
conclusion with respect to subjective weights. Several experiments support
the more recent interpretation.

The present study compared weight estimation procedures for additive,
riskless four-attribute value functions with linear single-attribute values.
Self-explicated (subjective) weights were assessed from direct subjective and
rank order estimates of attribute importance; observer-derived weights were
determined both from indifference judgments (axiomatic approach) and from
holistic evaluations (statistical approach) of alternatives. Assessed weights
were compared to a "true" weight vector used to generate feedback during
pre-assessment learning trials (constructed with zero inter-attribute
correlations). Although self-explicated weights tended to be flatter than
observer-derived weights, resulting composites correlated equally well with
"true" composites. Only slight differences were found in ordinal
correspondence between "true" and assessed weights.

INTRODUCTION

Judgments about the relative desirability of acts or objects are inherently
subjective. They depend on subjective likelihoods of the consequences of
choosing an act or object, on subjective values for these consequences, and
on subjective trade-offs among different consequences. Multi-attribute
utility analysis (MAUA) models such subjective value judgments by eliciting
value-relevant attributes of the objects or acts, by assessing
single-attribute utilities and weights, and by aggregating these inputs into
an overall value index. Proponents of MAUA argue that the choices dictated by
MAUA will, on the average, yield more favorable consequences than choices
based on other types of evaluations, e.g., intuition. However, since both
inputs and outputs of MAUA are subjective numbers, and since the consequences
of any choice are subjectively experienced, researchers have faced
substantial difficulties in validating that claim.

In this paper we will explore a validation paradigm based on the thesis that
in many cases value is simply a surrogate for probability. This paradigm
allows us to validate the MAUA claim, and to test competing MAUA procedures,
by applying evaluation methods in situations in which probabilistic
relationships between choices and their consequences can be ascertained. One
need only compare the resultant evaluations of choices (derived from various
MAUA procedures and intuition) to the (known) distribution of consequences
associated with each alternative. In the following we will discuss the
conceptual basis and an operationalization of this paradigm in more detail.
Subsequently, we will describe an experiment
which validated our MAUA weighting procedures within this paradigm.

In many evaluation problems, the relationship between value and probability
is obvious: A "good" applicant for graduate school is likely to succeed in
the graduate program; a "good" credit applicant is unlikely to default; a
"good" scientific manuscript is likely to be accepted for publication in a
prestigious journal. However, in every one of the examples above, the
defining characteristics of the alternatives are probabilistically related to
future consequences that are determined once the choice is made.

In most cases, degree of deservedness (worth) is dependent upon the
alternative's likelihood of resulting in each possible consequence (outcome
state) and the desirability of each consequence.

In  a  credit  granting  decision,  for  example,  the  outcome  states  might 
be  discrete  (such  as  default  vs.  no  default)  or  continuous  (such  as  the 
dollar amount of profit made on the loan). In the discrete (dichotomous)
case,  worth  is  often  considered  monotonic  to  the  likelihood  of  the  "good" 
outcome,  e.g.,  no  default,  while  in  the  continuous  case,  worth  is  normally 
thought  to  vary  monotonically  along  a  bipolar  continuum  from  "bad"  (e.g., 
substantial  dollar  loss)  to  "good"  (e.g.,  large  dollar  profit).  Thus,  an 
alternative  possesses  no  worth  or  "deservedness"  in  and  of  itself;  rather, 
worth  is  induced  upon  the  alternative  as  a  function  of  the  probabilistic 
relationship  between  alternative  characteristics  and  future  consequences. 

This  theoretical  position  is  widely  held  in  modern  psychology:  beliefs 
(probabilistic  relationships)  determine  affects  (worth  evaluations),  which 
in turn determine behavior (choices). In other words, what we think
influences what we feel, and what we feel influences what we do.

Most day-to-day choices are made from evaluations based on a casual learning
of the relevant probabilistic relationship between alternatives and
consequences. Indeed, there may be little or no thought given to the beliefs
and affects that influence choice. Important decisions, such as those listed
above, usually require accurate evaluations, which in turn are best obtained
by a precise knowledge of the probabilistic relationship between alternative
characteristics and outcome states. In such cases, prior decisions and their
resulting outcomes may be scrutinized. If the decision is important enough,
and if a sufficient number of past decisions and consequences have been
documented and stored, professional learners (such as applied statisticians,
management scientists, and industrial psychologists) may be employed to use
complex retrospective techniques for uncovering useful probabilistic
relationships between alternative characteristics and outcome states.

For many important decision problems (e.g., choosing a school desegregation
plan) there is very little or no documented prior experience. Even when many
past observations have been collected and stored, the probabilistic
relationship may prove too complex for traditional post hoc analyses. Yet,
although (normative) belief structures cannot be explicated, affect will
usually persist. That is, even in the absence of explicit relationships
between alternative characteristics and consequences, various properties of
the alternatives will be viewed as more or less desirable or worthy than
other properties. Unlike the probabilistic relationships, which may be
discovered by analyzing the environment, affect structures can only be
explicated by studying the decision maker(s).

We used an operationalization of this validation strategy (cf. Pearl, Note
1), tested by John and Edwards (Note 2) and similar to that utilized by
Schmitt (1978), to compare weight estimation procedures for additive,
riskless four-attribute value functions with linear single-attribute values.
Estimated weights were compared to the "true" weights in the "artificial
environment of choice-reward", i.e., in the linear model used to generate
outcome feedback.

Of central interest is the performance of client-explicated methods (such as
rank weights, subjective [ratio] estimation, and constant sum) relative to
so-called observer-derived methods (such as pricing-out, trading-off to the
most important dimension, regression weights, and ANOVA weights derived from
an orthogonal design). (For reviews of the client-explicated vs.
observer-derived distinction, see Fischer, 1975, 1979; Huber, 1974a, 1974b;
Johnson & Huber, 1977.) All client-explicated approaches assume an additive
model form and depend upon direct subjective estimates of all parameters,
including weights. Subjective estimation techniques determine scale values of
attributes on a dimension of "importance in determining the overall construct
of evaluation". These scale values are called weights.

In contrast, observer-derived approaches typically rely on (holistic)
judgments that relate directly to the relative standing of some subset of
choice alternatives on the construct of evaluation. Proposed aggregation
rules are accepted only if the holistic judgments do not indicate violations
of axioms or rejection of statistical hypotheses necessary for the model
representation. Each holistic judgment can be thought of as representing one
equation with some number of unknowns, depending upon the complexity of the
accepted model form. In general, axiomatic procedures require a number of
holistic judgments (equations) equal to the number of unknowns, and the
parameter values (including weights) can be thought of as simply the solution
to a set of simultaneous equations. Often, independent sets of holistic
judgments (equations) are obtained, and the solution parameters from each are
compared. This is called sensitivity analysis. On the other hand, statistical
procedures usually require a much larger number of holistic judgments
(equations) than unknowns. Here, each judgment (equation) contains an error
term, and parameter values are usually the critical point (minimum) of a loss
function (such as least squares) defined over the errors. The sensitivity of
statistical models is often gauged by the errors of estimate of the
parameters.

Over twenty years after Paul Hoffman's (1960) seminal work on the
correspondence of subjective (self-explicated) and statistical
(observer-derived) weights, there is little consensus as to whether weights
should be constructed via direct assessments of importance. A very
influential review by Slovic and Lichtenstein (1971) set the tone for much of
the research for the past ten years, and their conclusions have been echoed
by researchers across a diverse literature: management science (Zeleny, 1976,
p. 14); attitude theory (Fishbein and Ajzen, 1972, p. 501; 1975, p. 159);
verbal reporting on mental processes (Nisbett and Wilson, 1977, p. 254).
Early results suggested that although statistical weights and subjective
weights show some correspondence in regression-like situations, subjective
weights tend to be too flat by comparison; statistical weights usually show
that some attributes are quite important, while others are hardly important
at all. More recent discussions of this literature, however, have pointed out
a number of methodological problems with much of the early research, and have
reached a more optimistic conclusion with respect to subjective weights
(Schmitt and Levine, 1977; John and Edwards, Note 3). Several experiments
support the more recent interpretation (Brehmer and Qvarnstrom, 1976;
Schmitt, 1978; John and Edwards, Note 2).

Method

Overview  and  Independent  Variables 

Forty-six college students were taught a four-attribute MAU model of diamond
worth using the paradigm of multiple cue probability learning and outcome
feedback; after training, subjects assessed MAU weight parameters via a
variety of elicitation techniques. Although all subjects saw diamond profiles
and outcome feedback with similar multivariate distributions (equal attribute
variances and means, zero intercorrelations among attributes, and weight
parameters in the ratio of 8:4:2:1), three task variables thought to affect
learning were manipulated. Monetary payoff was manipulated by telling half of
the subjects that they could earn up to $10.00 in cash, the exact amount
depending upon their performance during the experiment. The other half were
given no monetary incentive. Task uncertainty was set at one of two levels;
half of the subjects received small random error in the diamond worth
feedback (1% of total variance), while the other half received larger random
error (18% of total variance). Exposure to the MAU model was manipulated by
varying the total number of learning trials. Half of the subjects were
trained for 120 trials, while the other half completed only 60 trials.
Immediately after model training every subject made several independent
assessments of attribute importance. These elicitation techniques will be
described in detail subsequently.

Subjects

Forty-six  students  (26  males  and  20  females)  were  selected  from  a  much 
larger  pool  of  volunteers  from  an  Introductory  Psychology  course.  The 
criteria  for  selection  were  scores  of  at  least  600  (males)  or  550  (females) 
on  the  mathematical  aptitude  section  of  the  SAT,  and  a  requirement  that 
all subjects be whites whose native tongue was English; the latter
requirement was imposed because the experimental stimuli, hypothetical
diamonds, relate to cultural mores.

Subjects were run either individually or in groups of up to six. Each session
lasted 1 to 2 hours. Subjects received some payment (see below) and
experimental credit in fulfillment of a course requirement.

Training  Procedure 

Each subject sat in front of a computer terminal with a CRT display. A
written set of instructions said that subjects were to learn, via
computer-assisted instruction, a manner in which diamonds are appraised.
Diamonds are evaluated on the basis of cut, color, clarity, and carat weight.
The instructions explained these dimensions in considerable detail, and
asserted (incorrectly) that any diamond can be described as a profile of four
numbers, each between 0.0 and 10.0, representing the diamond's rating on the
four attributes. Value increases with rating on each dimension.

During the experiment, the CRT would display a profile of four labelled
numbers. The prompt "PRICE?" then appeared, and the subject entered a dollar
estimate on the keyboard. Then the CRT displayed the "true" price of that
diamond, the signed difference between "true" price and the subject's
estimate, and a standardized error score calculated as follows:

$$\text{Error Score} = \frac{E(MSE)_{\text{equal}} - (\text{Error})^2}{E(MSE)_{\text{equal}} - E(MSE)_{\text{beta}}} \tag{1}$$

E(MSE)_equal is the expected mean squared error using equal weights,
(Error)^2 is the squared deviation of the subject's estimate from the
feedback, and E(MSE)_beta is the expected mean squared error using the
optimal beta weights.

The instructions explained that a score of 1 is excellent, a score of 0 is
very poor, and that scores above 1 or below 0 are possible but very
infrequent. Subjects also recorded on paper any errors they detected in the
feedback about the difference between estimated and true value; these are
terminal errors. The few such instances were later corrected by editing.
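
As a minimal sketch of the error score in Equation 1, the following Python
function computes the score for one trial. The function name and the two
expected-MSE arguments are hypothetical; the text does not report numerical
values for E(MSE)_equal or E(MSE)_beta, so the numbers in the usage lines are
placeholders chosen only to illustrate the scale.

```python
def error_score(estimate, feedback, emse_equal, emse_beta):
    """Standardized error score for one trial (sketch of Equation 1).

    emse_equal and emse_beta are the expected mean squared errors under
    equal weights and under optimal beta weights; both are supplied by the
    caller because the report does not state their values.
    """
    squared_error = (estimate - feedback) ** 2
    return (emse_equal - squared_error) / (emse_equal - emse_beta)


# Placeholder expected-MSE values, for illustration only.
EMSE_EQUAL, EMSE_BETA = 400_000.0, 10_000.0
as_good_as_beta = error_score(3100.0, 3000.0, EMSE_EQUAL, EMSE_BETA)
perfect_guess = error_score(3000.0, 3000.0, EMSE_EQUAL, EMSE_BETA)
```

Under these placeholder values, an estimate whose squared error equals
E(MSE)_beta scores 1, and an exact estimate scores slightly above 1, which is
consistent with the instructions' remark that scores above 1 are possible.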

Stimulus Generation

The  ratings  came  from  uniform  distributions  over  the  0  to  10  range 
on  each  dimension.  Consequently  the  expected  value  for  each  attribute  was 
5.0,  and  its  standard  deviation  was  2.9.  The  expected  intercorrelation 
between any pair of attributes was 0. The same set of 120 diamond profiles
was presented in the same order to all subjects who saw 120 profiles; those
who  saw  only  60  profiles  saw  the  first  60  of  those.  Sample  statistics  by 
30-trial  blocks  for  all  stimuli  are  acceptably  close  to  their  population  values. 

Outcome feedback was calculated from the following model:

$$\text{True Price} = 320\,C_1 + 160\,C_2 + 80\,C_3 + 40\,C_4 + k\,N(0,1) \tag{2}$$

In Equation 2, C_i is the rating on the ith dimension, k is a constant that
determines the precision of the model, and N(0,1) is standardized normal
random error. The values k = 100 and k = 500 were used for different groups
of subjects. The expected value of the true price is $3000; its standard
deviation is 1069 if k = 100 and 1176 if k = 500. Consequently, the expected
squared multiple correlation between the true price and the four attributes
was .99 for k = 100 and .82 for k = 500. Four different assignments of
attribute labels to weights were devised, and one was chosen randomly for
each subject.
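
The moments quoted above follow from the feedback model and the uniform
rating distribution. The sketch below, in Python, generates one feedback
price and recomputes the population mean, standard deviation, and squared
multiple correlation for both values of k. The function names and the random
seed are illustrative assumptions, not part of the original procedure.

```python
import math
import random

WEIGHTS = (320.0, 160.0, 80.0, 40.0)   # dollar weights of the feedback model
RATING_VAR = 10.0 ** 2 / 12.0          # variance of a uniform(0, 10) rating


def true_price(ratings, k, rng):
    """Feedback price for one profile: weighted ratings plus k * N(0, 1)."""
    signal = sum(w * c for w, c in zip(WEIGHTS, ratings))
    return signal + k * rng.gauss(0.0, 1.0)


def expected_moments(k):
    """Population mean, standard deviation, and squared multiple correlation."""
    mean = sum(w * 5.0 for w in WEIGHTS)
    signal_var = RATING_VAR * sum(w * w for w in WEIGHTS)
    total_var = signal_var + k * k
    return mean, math.sqrt(total_var), signal_var / total_var


rng = random.Random(0)  # arbitrary seed, not taken from the study
profile = [rng.uniform(0.0, 10.0) for _ in WEIGHTS]
example_price = true_price(profile, 100.0, rng)
mean_100, sd_100, r2_100 = expected_moments(100.0)
mean_500, sd_500, r2_500 = expected_moments(500.0)
```

Rounded, these moments agree with the values reported in the text: a mean of
3000, standard deviations of 1069 and 1176, and squared multiple correlations
of .99 and .82.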

Post-Learning  Weighting  Judgments 

Upon completion of the learning trials, the subject went individually to
another room, and received a seven-page self-administered booklet for weight
assessments. The experimenter asked the subject to read the instructions at
the top of each page, and to ask any questions before starting work on that
page. The order of assessment procedures was identical for all subjects. No
subject could change previous responses after turning a page.

Bootstrapping.  Raw  regression  weights  were  obtained  by  standard  least 
squares  regression  analysis  of  each  subject's  responses  over  the  last  30 
learning  trials. 
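
To make the bootstrapping step concrete, the sketch below fits raw
least-squares weights to a block of 30 judgments using only the Python
standard library. It is one possible implementation under stated assumptions:
the helper names are hypothetical, an intercept is included (the text does
not say whether one was fitted), and the judge's weights in the usage lines
are made-up numbers used only to check that the procedure recovers them.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def bootstrap_weights(profiles, responses):
    """Raw OLS weights (intercept first) from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(p) for p in profiles]
    size = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Simulated judge who applies weights 300, 150, 100, 50 without error
# (made-up numbers, used only to check that the weights are recovered).
rng = random.Random(1)
profiles = [[rng.uniform(0.0, 10.0) for _ in range(4)] for _ in range(30)]
responses = [300 * a + 150 * b + 100 * c + 50 * d for a, b, c, d in profiles]
fitted = bootstrap_weights(profiles, responses)
```

With error-free responses the fitted weights reproduce the judge's weights;
with real judgments the residual variance reflects the subject's
inconsistency.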

Ranking. The subject simply rank ordered the four attributes from most
to least important in determining price.

Most important dimension. The subject identified a most important dimension,
and assigned a percentage that represented its importance in determining
price. The instructions said that the ratio of the assigned percentage to 100
minus that percentage represented the ratio of the importance of the most
important attribute to the total combined importance of the other three
attributes.

Constant sum. The subject distributed 100 points across the four attributes,
according to importance. The instructions said only that more important
attributes should receive higher percentages.

Ratio estimation. First the subject once more ranked the attributes in order
of decreasing importance. The least important dimension was assigned a weight
of 10, and the subject provided weights for the other three dimensions using
that weight as an anchor. The instructions said that the ratio of any given
pair of weights should reflect the number of times more important one
attribute is than the one with which it is being compared. (This is the
response mode Edwards [1977] proposed in his SMART procedure.)

Pricing out. The subject was told to imagine that he or she possesses $3000
in cash and a diamond that scores (0, 0, 0, 0), the worst possible scores on
all four dimensions. For each dimension, the subject states how much he or
she would be willing to pay in order to exchange that diamond for one that
scores 10 on that dimension and 0 on the other three. (For details, see
Keeney & Raiffa, 1976, p. 125.)
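
If single-attribute values are linear and the model is additive, the amount
stated for a 0-to-10 swing on a dimension is proportional to that dimension's
weight. The Python sketch below normalizes such amounts to percentages. The
function name and the normalization to 100 are assumptions made for
illustration; the text does not describe how pricing-out responses were
scored.

```python
def pricing_out_weights(amounts):
    """Normalized weights (percent) proportional to the stated dollar amounts."""
    total = sum(amounts)
    return [100.0 * a / total for a in amounts]


# Amounts a subject would state after learning the feedback model exactly:
# a 0-to-10 swing on a dimension is worth 10 times its dollar weight.
ideal_amounts = [10 * w for w in (320, 160, 80, 40)]
ideal_weights = pricing_out_weights(ideal_amounts)
```

For a subject who had learned the 8:4:2:1 model perfectly, the largest
normalized weight would be 8/15 of the total, about 53.3.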


Trading off to the most important dimension. The subject once more identifies
the most important dimension. For convenience of exposition, suppose that it
is the first one listed. Then the subject must specify a value of x such that
diamonds (x, 0, 0, 0) and (0, 10, 0, 0) are equivalent in price. This
judgment must be made four times, once for each dimension. Of course, when
the most important attribute is set to 10, the two diamonds will be
identical; this judgment was used to make sure the subject understood the
instructions. (Again, for details, see Keeney & Raiffa, 1976, p. 121.)
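
Under an additive model with linear single-attribute values, each
indifference judgment fixes one weight ratio: if (x, 0, 0, 0) matches a
diamond scoring 10 on dimension j alone, then w_j / w_MID = x / 10. The
Python sketch below turns four such judgments into normalized weights. The
function name and the normalization are illustrative assumptions rather than
the scoring rule actually used.

```python
def tradeoff_weights(indifference_points, scale_max=10.0):
    """Normalized weights (percent) from indifference points on the MID.

    indifference_points[j] is the rating x on the most important dimension
    that makes (x, 0, 0, 0) match a diamond scoring scale_max on dimension j
    alone; under an additive linear model, w_j / w_MID = x / scale_max.
    """
    relative = [x / scale_max for x in indifference_points]
    total = sum(relative)
    return [100.0 * r / total for r in relative]


# Indifference points implied by the 8:4:2:1 training weights; the first
# entry is the check judgment on the most important dimension itself.
ideal_points = [10.0, 5.0, 2.5, 1.25]
ideal_weights = tradeoff_weights(ideal_points)
```

The first entry corresponds to the check judgment described above, where the
two diamonds are identical and the only consistent answer is x = 10.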

Holistic Orthogonal Parameter Estimation (HOPE). HOPE simply required the
subject to appraise 17 diamonds holistically. The set of 17 diamonds is
carefully chosen so that parameters can be recovered from the judgments. (The
HOPE procedure, developed by Barron and Person [1979], is closely akin to
standard fractional replication ANOVA designs.)

Results

MAU  Model  Learning 

The lens model index of matching (G) is the correlation between composites
derived from consistent application of the weights used to generate outcome
feedback and the weights derived statistically from subjects' holistic
diamond appraisals. Thus, G (often called "knowledge", appropriately enough)
is a measure of the extent to which the subject's combination rule (weight
vector) corresponds to that of the "true" model in creating composites. For
the specific MAU model we taught our subjects, the correlation between
composites from different sets of weights is directly related to the
parameters of the bivariate distribution describing the weights. When all
attribute correlations are zero and the attribute variances are equal, the
correlation between composites from subject's weights and from true weights
is given by the following formula (Gulliksen, 1950, p. 319):¹


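$$r_{X_s X_t} = \frac{r_{st}\,(\sigma_s / M_s)(\sigma_t / M_t) + 1}{\sqrt{\left[1 + (\sigma_s / M_s)^2\right]\left[1 + (\sigma_t / M_t)^2\right]}} \tag{2}$$

(where s and t are the subject's and "true" weighting schemes, respectively,
and X_s and X_t are the composite evaluations resulting from them; r_st is
the correlation between the two sets of weights, and sigma and M are the
standard deviation and mean of the weights within each scheme). Equation 2
was used to calculate matching scores for every subject for each block of 30
learning trials.²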
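
The composite-correlation formula can be checked numerically. The Python
sketch below computes it two ways: directly from the weight vectors, and from
the means, standard deviations, and correlation of the weights. The function
names are hypothetical, and both versions assume uncorrelated attributes with
equal variances, as stated above. The first two usage lines reproduce the
benchmark correlations of .81 (equal weights) and .87 (most important
dimension only) quoted later in the Results.

```python
import math


def composite_correlation(s, t):
    """Correlation between composites built from weight vectors s and t,
    assuming uncorrelated attributes with equal variances."""
    dot = sum(a * b for a, b in zip(s, t))
    return dot / math.sqrt(sum(a * a for a in s) * sum(b * b for b in t))


def composite_correlation_moments(s, t):
    """Same quantity from the means, SDs, and covariance of the weights."""
    n = len(s)
    ms, mt = sum(s) / n, sum(t) / n
    ss = math.sqrt(sum((a - ms) ** 2 for a in s) / n)
    st = math.sqrt(sum((b - mt) ** 2 for b in t) / n)
    cov = sum((a - ms) * (b - mt) for a, b in zip(s, t)) / n
    cs, ct = ss / ms, st / mt              # coefficients of variation
    # cov / (ms * mt) equals r_st * cs * ct and stays defined when ss == 0.
    return (cov / (ms * mt) + 1.0) / math.sqrt((1 + cs ** 2) * (1 + ct ** 2))


TRUE = (8.0, 4.0, 2.0, 1.0)
equal_r = composite_correlation((1.0, 1.0, 1.0, 1.0), TRUE)
mid_only_r = composite_correlation((1.0, 0.0, 0.0, 0.0), TRUE)
flat_r = composite_correlation_moments((40.0, 30.0, 20.0, 10.0), TRUE)
```

The third usage line uses a made-up, flatter-than-true weight vector to show
that the two forms agree.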

Figure 1 shows average G scores as a function of number of trials and k.
(Payoff or its absence makes no difference in the data.) Figure 1 shows a
significant increase in matching from the first trial block to the second
(F(1,38) = 15.39, p < .05). This result also holds for the 60 trial subjects
considered separately (F(1,19) = 4.82, p < .05), and across all four trial
blocks for the 120 trial subjects (F(3,57) = 10.79, p < .05). There is no
significant increase in performance across the last trial blocks for the 120
trial subjects (F(2,38) = 2.29, p > .05). Subjects' learning about the
weights is virtually complete by about trial #30. For both subjects who
received payoffs and those who did not, the combination of little task
uncertainty and an expectancy of many learning trials produced very poor
performance in the first 30 trials.

[Figure 1. Average G scores by trial block. Legend entries: (12) .99, 120
Trials; (11) .82, 120 Trials; (11) .99, 60 Trials; (12) .82, 60 Trials.
Horizontal axis: TRIAL BLOCK.]

Weight  Assessments 

Subjects' knowledge of the weights after the first trial block is not
mediated by monetary payoffs, task uncertainty, or the number of learning
trials completed (see Figure 1); thus, we have collapsed weight assessments
across all three task manipulations. Whether or not the subject assigned the
largest weight to the most important attribute and whether or not he/she
assigned weights in the correct rank ordering are good indications of weight
correspondence. The number of subjects who correctly indicated the most
important dimension (ties not counted) and the number who indicated the
correct rank ordering (including at most 1 tie) are shown in Table 1 for each
of the seven assessment techniques.

TABLE 1
Weight Orders

                 # of Ss Correctly      # of Ss with <1    Mean # of
Assessment       Identifying Most       Inversions with    Inversions with
Technique        Important Dimension    True Weights       True Weights

Bootstrapping            35                   16               1.06
Ranking                  35                    7               1.37
Constant Sum             32                   11               1.44
Ratio                    36                    9               1.42
Pricing-Out              34                   12               1.41
Trading-Off              33                    8               1.73
HOPE                     30                    7               1.68

Subjects most often correctly identified the most important dimension using
the ratio technique and most often indicated the correct rank ordering using
the bootstrapping method. However, there were no significant differences on
either of these measures (χ²(6) = 8.18, p > .05 and χ²(6) = 2.28, p > .05,
respectively). A more sensitive measure of correspondence is the number of
inverted attribute pair orders (a linear transformation of Kendall's tau).
The mean number of such inversions for each technique is also shown in
Table 1. The fewest inversions resulted from the bootstrapping weights while
trading-off to the most important dimension and HOPE produced the most
inversions. The mean number of inversions was significantly different across
assessment procedures (F(6,228) = 3.09, p < .05). Well over 90% of the
subjects yielded 3 or fewer inversions for all of the obtained attribute
orderings. Furthermore, all of the cumulative distributions of inversions are
significantly different from that expected if subjects were simply providing
random orderings (by the Kolmogorov goodness of fit test, p < .05).
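
The inversion measure and its relation to Kendall's tau can be sketched as
follows in Python. The function names are hypothetical, and the treatment of
ties (a tied pair is not counted as an inversion) is an assumption, since the
text does not say how ties entered the inversion counts. The last usage line
enumerates all 24 orderings of four attributes to obtain the mean inversion
count expected under random ordering.

```python
from itertools import combinations, permutations


def count_inversions(assessed, true):
    """Number of attribute pairs ordered oppositely by the two weight vectors.

    A pair tied in the assessed weights is not counted as an inversion
    here; the study's handling of ties may have differed.
    """
    return sum(
        1
        for i, j in combinations(range(len(true)), 2)
        if (assessed[i] - assessed[j]) * (true[i] - true[j]) < 0
    )


def kendall_tau(inversions, n_attributes=4):
    """Kendall's tau as a linear function of the inversion count (no ties)."""
    n_pairs = n_attributes * (n_attributes - 1) // 2
    return 1.0 - 2.0 * inversions / n_pairs


TRUE = (8, 4, 2, 1)
one_swap = count_inversions((30, 40, 20, 10), TRUE)   # top two reversed
random_mean = sum(
    count_inversions(p, TRUE) for p in permutations(TRUE)
) / 24.0
```

With four attributes there are six pairs, so tau runs from 1 (no inversions)
to -1 (six inversions), and random orderings average three inversions. The
technique means of 1.06 to 1.73 reported in Table 1 can be read against that
baseline.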

In addition to assigning weights in the correct rank ordering, we would like
subjects to spread the weights appropriately. One good indication of the
weight spread is the ratio of the weight assigned to the most important
dimension to the sum of the weights assigned to the remaining three
dimensions. Since a log transformation of this ratio is essentially linear
with the normalized weight assigned to the most important dimension, we have
elected simply to use the normalized weights. For four dimensions,
specification of the weight on the most important dimension severely
restricts the variance and the range of the weight vector. Of course, what
constitutes an appropriate weight on the most important dimension depends
upon whether the subject correctly identified the most important dimension or
not. If he/she did, then the optimal weight is 53.3; if some other attribute
receives a higher weight than the "true" most important dimension, flatter
weights are better than more extreme ones, i.e., the closer to 25, the
better. Table 2 displays mean maximum weights for each assessment technique
conditional upon those subjects who: correctly identified the most important
dimension (see Table 1, column a, for sample sizes); correctly identified the
most important dimension for all seven assessment techniques (N = 16, column
b); incorrectly identified the most important dimension (sample sizes are 46
minus the sample sizes for (a), column d). In addition, the percentage of
those subjects correctly identifying the most important attribute (column a)
who gave weights less than the optimal value (53.3) is shown in column (c).

TABLE 2
Mean Weights on the Most Important Dimension (MID)

                 _______ S Correctly Identified MID _______   S Incorrectly
                                                              Identified MID
Assessment       Correct for      Correct for       % with    Weight on
Technique        EACH Technique   ALL Techniques    Weight    Ss MID
                                  (N = 16)          < 53
                     (a)              (b)             (c)         (d)

Bootstrapping        52.3             55.2            57%         32.6
Direct Assess
  of MID             43.6             43.9            88%         39.0
Constant Sum         41.9             43.9            91%         36.3
Ratio                41.6             42.1           100%         41.8
Pricing-Out          42.2             42.2            94%         44.5
Trading-Off          46.3             51.7            79%         38.8
HOPE                 48.3             51.6            80%         39.9
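
The two reference values used in the discussion of weight spread, 53.3 and
25, follow directly from the 8:4:2:1 training weights and from equal
weighting. The short Python sketch below recomputes them, together with the
log ratio mentioned as the alternative spread index. The function names are
hypothetical and are introduced only for illustration.

```python
import math

TRUE = (8.0, 4.0, 2.0, 1.0)


def mid_weight_percent(weights):
    """Normalized weight (percent) on the largest-weighted dimension."""
    return 100.0 * max(weights) / sum(weights)


def log_mid_ratio(weights):
    """Log of the ratio of the largest weight to the sum of the others."""
    top = max(weights)
    return math.log(top / (sum(weights) - top))


optimal = mid_weight_percent(TRUE)                    # 8 / 15 of the total
flattest = mid_weight_percent((1.0, 1.0, 1.0, 1.0))   # equal weights
optimal_log_ratio = log_mid_ratio(TRUE)               # log(8 / 7)
```

The optimal normalized weight is 8/15 of 100, which rounds to 53.3, and equal
weights put 25 on every dimension, the flattest possible assignment.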

In general, all of the weighting techniques, with the exception of
bootstrapping, underestimated the weight on the correctly identified most
important dimension. HOPE and trading-off to the most important dimension
tended to provide more extreme weights on the correctly identified most
important dimension than did the remaining four assessment techniques. A
repeated measures analysis of variance, not including the 3 task
manipulations, was run over the 16 subjects who correctly identified the most
important dimension on all seven assessments. The means in column (b) were
found to be significantly different from one another (F(6,90) = 4.24, p <
.05). A comparison of columns (a) and (b) suggests that mean weights on the
most important dimension are larger for those subjects who correctly
identified the most important dimension for all assessments than for those
who did so for only a subset of them. Comparing column (a) with column (d)
suggests the pleasant finding that subjects who did not know the most
important dimension assigned flatter weights.

The results of this analysis suggest that bootstrapping weights are
best in terms of producing both the correct rank ordering among the
attributes and the correct weight magnitudes. HOPE and trading-off to the
most important dimension are better than average in terms of magnitude or
spread, but are the poorest at generating the correct rank ordering. Of
course, these two effects will tend to cancel each other. All of the other
techniques produce highly similar orderings and spreads. Thus, it is not
surprising that correlations between composites (calculated from Equation 2)
from the subjects' weights and true model weights (assuming equal expected
variances, all zero intercorrelations, and the expected OLS regression
weights) show little differentiation. Average correlations range from .88
for trading-off and pricing-out to .92 for bootstrapping weights. These
slight differences were not significant (F(5,190) = 1.88, p > .05). Neither
is it surprising that correlations between composites from the subject's
bootstrapping weights and various other subjective weights demonstrate no
differences (F(4,152) = 1.18, p > .05). Mean correlations with bootstrapping
range from .89 for pricing-out to .92 for HOPE. This overall level of
performance is quite good, considering that equal weights produce a
composite correlation of only .81 and extreme weights (using the most
important dimension only) yield a composite correlation of .87 with the
true weights.
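
The two reference values for equal and extreme weighting can be checked with a
short calculation. The sketch below assumes that, with equal attribute
variances and zero intercorrelations, the composite correlation reduces to the
cosine between the two weight vectors; the function name
`composite_correlation` is illustrative and is not taken from the study.

```python
import math

def composite_correlation(w, v):
    """Correlation between composites formed from weight vectors w and v.

    Assumption: attributes have equal variances and zero intercorrelations,
    so the correlation equals the cosine of the angle between w and v.
    """
    dot = sum(a * b for a, b in zip(w, v))
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_w * norm_v)

true_w = [8, 4, 2, 1]      # "true" weight ratios (8:4:2:1)
equal_w = [1, 1, 1, 1]     # equal weighting
extreme_w = [1, 0, 0, 0]   # most important dimension only

print(round(composite_correlation(true_w, equal_w), 2))    # 0.81
print(round(composite_correlation(true_w, extreme_w), 2))  # 0.87
```

Because only the direction of a weight vector matters in this calculation,
the weights need not be normalized to sum to 100 before comparing them.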

Rank Weighting

Four sets of rank weights were generated from each subject's rank
ordering of the attributes; two are designed so that the weight on the most
important dimension matches that directly assessed by the subject. Rank-sum
weights are a linear transformation of the ranks and rank-reciprocal
weights are proportional to the reciprocals of the ranks. Decision-rule
rank weights are determined by comparing the subject's directly assessed
weight on the most important dimension to the weight on the most important
dimension for rank-sum, rank-reciprocal, and equal weights. Whichever of
these three procedures produces the least discrepant weight on the most
important dimension yields the decision-rule rank weights. Rank-exponent
weights, proposed by Stillwell and Edwards (Note 5), are determined from
Equation 3:

    W_i = \frac{(K + 1 - R_i)^{z}}{\sum_{j=1}^{K} R_j^{z}}    (3)

(W_i is the normalized weight on the ith dimension, R_j is the subject's
ranking of the jth dimension, and K is the number of dimensions.) By
substituting the elicited value of the weight on the most important
dimension for W_i in Equation 3 (with i the dimension ranked first), z is
easily determined by iterative numerical methods to any degree of accuracy
desired.
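
As an illustration of the three rank transformations and of the iterative
solution for z, a minimal sketch follows. The function names and the choice of
bisection as the iterative method are assumptions made for this example; the
report states only that z is found numerically. The sketch also assumes
untied ranks 1 through K, so that the denominator of Equation 3 equals the
sum of the numerators.

```python
def rank_sum_weights(ranks):
    """Weights linear in the ranks: proportional to K + 1 - R_i."""
    k = len(ranks)
    raw = [k + 1 - r for r in ranks]
    total = sum(raw)
    return [x / total for x in raw]

def rank_reciprocal_weights(ranks):
    """Weights proportional to 1 / R_i."""
    raw = [1.0 / r for r in ranks]
    total = sum(raw)
    return [x / total for x in raw]

def rank_exponent_weights(ranks, z):
    """Equation 3: (K + 1 - R_i)^z divided by the sum of R_j^z."""
    k = len(ranks)
    denom = sum(r ** z for r in ranks)
    return [(k + 1 - r) ** z / denom for r in ranks]

def solve_z(top_weight, k, lo=0.0, hi=50.0, tol=1e-9):
    """Bisection for the z that gives the rank-1 dimension top_weight.

    Assumes 1/k <= top_weight < 1; the rank-1 weight rises from 1/k at
    z = 0 toward 1 as z grows, so a single root is bracketed.
    """
    ranks = list(range(1, k + 1))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rank_exponent_weights(ranks, mid)[0] < top_weight:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ranks = [1, 2, 3, 4]
z = solve_z(16 / 30, k=4)  # elicited top weight of 53.3%
print(round(z, 3))                                                   # 2.0
print([round(100 * w, 1) for w in rank_exponent_weights(ranks, z)])  # [53.3, 30.0, 13.3, 3.3]
```

Setting z = 1 reproduces the rank-sum weights, and z = 0 gives equal weights,
so a single exponent spans the range from flat to steep weight vectors.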

For the "true" MAU model we used, all four rank weighting schemes can
potentially perform quite well. A subject who yields the correct rank
ordering of the attributes (zero inversions) would obtain a correlation with
the true weight composites of .97 for rank-sum weights (40, 30, 20, 10)
and .99 for rank-reciprocal weights (48, 24, 16, 12). As we saw in Table 1,
the direct ranking procedure was quite good in providing nearly correct rank
orderings of attributes; thus, it is not surprising that the average
rank-sum and rank-reciprocal correlations were .92 and .95, respectively.
Had a subject not only provided the correct rank ordering, but also the
correct directly assessed weight on the most important dimension (53.3), the
decision-rule rank weights would have been the same as the rank-reciprocal
weights; rank-exponent weights under these conditions (53.3, 30.0, 13.3, 3.3)
yield a correlation very close to 1.0. Since the directly assessed weights
on the most important dimension were underestimated (Table 2), it is also
not surprising that decision-rule rank weights and rank-exponent weights
performed no better than the rank weights not utilizing the directly assessed
weight on the most important dimension. Correlations between composites
weighted with true weights and with decision-rule rank and rank-exponent
weights were .93 and .92, respectively.
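
The decision rule and the benchmark correlations quoted above can be
reproduced with a few lines of arithmetic. The sketch below is illustrative
only: the function names are hypothetical, the cosine form of the composite
correlation assumes equal attribute variances and zero intercorrelations, the
tie-breaking order is arbitrary, and the assessed value of 0.40 is a made-up
example of an underestimated top weight rather than a figure from the data.

```python
import math

def normalize(raw):
    """Scale a list of non-negative numbers to sum to 1."""
    total = sum(raw)
    return [x / total for x in raw]

def composite_correlation(w, v):
    """Cosine between weight vectors (equal variances, zero intercorrelations)."""
    dot = sum(a * b for a, b in zip(w, v))
    return dot / math.sqrt(sum(a * a for a in w) * sum(b * b for b in v))

def decision_rule_weights(ranks, assessed_top):
    """Choose rank-sum, rank-reciprocal, or equal weights, whichever puts a
    weight on the rank-1 dimension closest to the directly assessed value
    (a proportion). Ties go to the first candidate listed, an arbitrary choice.
    """
    k = len(ranks)
    candidates = {
        "rank-sum": normalize([k + 1 - r for r in ranks]),
        "rank-reciprocal": normalize([1.0 / r for r in ranks]),
        "equal": [1.0 / k] * k,
    }
    top = ranks.index(1)
    name = min(candidates, key=lambda n: abs(candidates[n][top] - assessed_top))
    return name, candidates[name]

true_w = normalize([8, 4, 2, 1])
ranks = [1, 2, 3, 4]  # correct rank ordering, zero inversions

name, w = decision_rule_weights(ranks, 0.533)  # correct top weight
print(name, round(composite_correlation(true_w, w), 2))  # rank-reciprocal 0.99

name, w = decision_rule_weights(ranks, 0.40)   # hypothetical underestimate
print(name, round(composite_correlation(true_w, w), 2))  # rank-sum 0.97
```

The second call shows why an underestimated top weight removes any advantage
for the decision rule: the rule simply falls back to the flatter rank-sum
vector.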


We have shown that the directly assessed weight of the most important
dimension is, like that for most of the other techniques, underestimated.
However, one issue concerning rank-exponent and decision-rule rank weights
is the degree to which directly assessing the weight on the most important
dimension is even possible. One critical question concerns the degree to
which the direct assessment will correspond to assessments involving all
dimensions. An ordinal analysis is presented in Table 3 showing the
frequencies of subjects estimating the weight to the most important dimension
(direct assessment) less than, greater than, and within 5% of the weight
estimate provided by the other six elicitation procedures. Recall that
the constant sum technique immediately followed the direct assessment of
the weight of the most important dimension and that the response modes
both required an estimate in terms of a percentage of 100. Somewhat
surprisingly, 13 subjects changed their estimates, with 11 choosing to
assign fewer points in the constant sum method. Thus, subjects reassessed
already too flat weights as even flatter when asked to provide weights to
the other three dimensions.

TABLE 3

Direct Assessment of Weight on the Most Important Dimension (MID)

                                  # of Subjects
Assessment       Direct Assessment   Direct Assessment   Equal    Changed
Technique        Greater             Less                ±2.5     MID
-------------------------------------------------------------------------
Bootstrapping           7                  19               2        17
Constant Sum           11                   2              28         3
Ratio                  10                  13              21         1
Pricing Out            16                  13               9         6
Trading Off            13                  14              13         5
HOPE                    4                  20               5        16

Discussion

All of the weight assessment techniques we studied yielded weights
corresponding to the "true" weights to about the same degree. No significant
differences in the correlation among composites were evidenced in our
comparison of holistic procedures (bootstrapping and HOPE), indifference
procedures (trading-off to the most important dimension and pricing-out),
direct subjective estimates (method of constant sum and magnitude estimates
with ratio instructions), and arithmetic transformations of rank orders
(rank-sum, rank-reciprocal, rank-exponent, and decision-rule rank
techniques). All of these procedures substantially outperformed equal
weighting and somewhat outperformed extreme weighting. Subjects exhibited
knowledge of the "true" weighting scheme beyond simply knowing that all
attributes are related to overall price (i.e., equal weighting) or that one
attribute is highly related to overall price (i.e., extreme weighting).

These  results  replicate  those  reported  by  John  and  Edwards  (Note  2). 

None of the more complicated weighting procedures performed any better
than the simple technique of directly assessing the rank ordering and
arithmetically transforming the ranks into weights. Although this might
suggest that subjects' weight assessments contain no useful information
beyond that embodied in their rank ordering of the attributes, we must be
cautious. The true weight ratios chosen for this experiment (8:4:2:1) along
with the attribute structure (4 attributes, zero intercorrelations, and equal
variances) provide an ideal setting for rank weights. That is, a correct rank
ordering produces a minimum correlation among composites of .97 for the rank
transformations suggested by Stillwell and Edwards (Note 5). In short, after
ranks are known, there is little room for improvement. Of course, in the
absence of analytical work, we have no way of assessing the generalizability
of this example.

That rank weights outperformed equal weights is an important replication
of a somewhat surprising finding by John and Edwards (Note 2). Although rank
weighting procedures for MAUA have been extensively studied for at least
fifteen years (e.g., Eckenrode, 1965; Permut, 1973), earlier results had
suggested no differences between rank and equal weights (e.g., Beckwith &
Lehmann, 1973; Eils & John, 1980; Einhorn & McCoach, 1977; Lehmann, 1971) or
inferior performance by rank weights (e.g., Newman, 1977).

In addition to the main findings cited above, four other specific results
are noteworthy. First, we found that model weights were learned in somewhat
fewer than 60 trials, probably between 20 and 40. The rate at which subjects
learned the model was altered by the combination of task uncertainty and
number of learning trials. Specifically, subjects who expected to see many
trials (120) and whose outcome feedback was relatively certain (1% variance
unaccounted for by the diamond profile) learned weights at a slower pace than
did other subjects. The final levels of weight knowledge observed were not
mediated by any of the task variables. For the number of learning trials
variable, a smaller value (less than 40 or so) would be needed to produce
any potent manipulation for the four attribute task situation we studied.
Monetary payoffs did not affect final levels of weight knowledge, probably
for one or both of two reasons: (1) most subjects did not seem to care about
such a "small" amount of money ($10.00); and (2) many subjects commented that
they found the "diamond appraisal" task quite interesting and stimulating.
Both of these casual observations fit our stereotype of USC undergraduates.

The lack of any main effect for task uncertainty is an important finding.
Subjects were able to learn and accurately report weights in a task
environment in which 18% of the variance was not accounted for by the five
attributes. In real world settings, much of the variance in overall
alternative value is often not accounted for by the specific sets of
attributes chosen to represent the MAU structure. Furthermore, weights are
often learned in highly uncertain real world environments in which all
factors that ultimately determine an alternative's overall worth are not
always known. Thus, our positive results in the 18% unaccounted-for variance
condition suggest that subjective weights can be obtained in complex, real
world-like settings.

Our second specific result concerns the relative ability of the different
weighting schemes to reproduce the correct rank ordering of attribute
weights. The best orderings were clearly produced by bootstrapping weights.
Trading-off to the most important dimension and HOPE yielded the greatest
number of inversions. This is a puzzling result. HOPE and bootstrapping
are virtually identical in terms of the subjects' task requirement (simple
holistic evaluations), yet their performance was quite disparate. Also,
trading-off to the most important dimension and pricing-out are very similar
indifference procedures, yet trading-off yielded poorer orders. Curiously,
bootstrapping was the first order obtained, and pricing-out and HOPE were
the last for all subjects. Although we did not expect it to be the case,
subjects may have become bored with the "somewhat repetitive" elicitation
procedures, or they may simply have forgotten the weight ratios learned
previously. (This explanation is most plausible for explaining the poorer
rank orders from HOPE, the last elicitation. After making holistic
evaluations and receiving outcome feedback from a computer, the paper and
pencil method with no feedback may have seemed substantially less glamorous.)
Although bootstrapping did enjoy the informational advantage of contiguous
feedback, it is also true that bootstrapping weights are based on 30 holistic
responses, the first 29 of which are made before the subject had completed
all of the learning trials.

The third specific result has been reported in the literature on
subjective weights many times: judged weights were too flat. Although all of
our procedures produced weights flatter than the "true" weights, HOPE and
bootstrapping weights were considerably more extreme than the others. Since
we conditionalized on only those subjects who correctly identified the most
important dimension, we conclude that HOPE and bootstrapping yielded more
nearly optimal weight spreads than did the other techniques. Thus, we seem
to have replicated previous findings that subjective (non-holistic)
assessment procedures produce weights that are too flat in comparison to
holistic ones.

The final specific result concerns the four methods we tested for
combining ordinal assessments of attribute importance with arithmetic
transformations of the ranks to arrive at a weight vector. Recall that
rank-sum and rank-reciprocal weights were based on the rank order assessment
alone, whereas rank-exponent and decision-rule rank weights combine rank
order information with a direct assessment of the weight on the most
important dimension. Our results showed no advantage to the methods that
utilize the direct assessment of the weight on the most important dimension.
Since most direct assessments of importance on the most important dimension
were about equal to the rank-sum weight on the most important dimension,
both decision-rule rank and rank-exponent weights were quite close to
rank-sum weights for most subjects.

Footnotes

1. Gulliksen (1950, p. 316) assumed that the attributes were in z-score
form (mean zero, variance one), and McClelland (Note 4) proved a similar
theorem by assuming that the variances were all equal to one. Both of these
assumptions are overly strong in terms of obtaining Equation 2. That the
attribute variances are equal is a sufficient condition.

2. The Fisher z transformation for the Pearson correlation coefficient
was not applied in the present report because all correlations were
calculated using population parameters (equal attribute variances, zero
intercorrelations, and "true" weights in the exact ratio of 8:4:2:1). Since
our matching scores are theoretical population values, there is no reason to
correct for biases in the sampling distribution of r. Had we actually
applied the weights to a given sample of diamond profiles and calculated
the correlation between the composites, then the z transformation would
have been appropriate.